Computing 22, 367—373 (1979) Computing 

© by Springer-Verlag 1979 



Algorithm 39 
Clusterwise Linear Regression 

H. Spath, Oldenburg 
Received August 3, 1978 

Abstract — Zusammenfassung 

Algorithm 39, Clusterwise Linear Regression. The combinatorial problem of clusterwise discrete linear 
approximation is defined as finding a given number of clusters of observations such that the overall 
sum of error sum of squares within those clusters becomes a minimum. The FORTRAN implemen- 
tation of a heuristic solution method and a numerical example are given. 

Algorithmic 39. Klassenweise lineare Regression. Die kombinatorische Aufgabe der klassenweisen p 
diskreten linearen Approximation wird dadurch definiert, daB die Summe uber die Fehlerquadrat- 
summen innerhalb der Klassen minimiert wird. Die FORTRAN-Implementation eines heuristischen 
Losungsverfahrens und ein numerisches Beispiel werderi angegeben. 



1. Problem 

The usual form of linear least squares regression can be described as follows. Let be 
given m observations^ (y u a ik ) ...,m,fc=l, ..,,/)' with m>l Determine 

(x u . . . , Xj) such that 

E f*-E <mY (i) 

is minimized. 

But if the observations ought to be assigned to different unknown groups due to 
not collected, not collectable or unknown values for further independent variab- 
les, then, especially in the case of m^> /, the following problem formulation seems 
to be more adequate: 

Find a partition C x , . . C n of given length n of the observations, i.e. 

CjcM={l,,.,4 |C i |>0, C,nQ=0 for j+k, (j C /= M, 

and n vectors (xj t , . . . , x^) (j = 1, . . . , n) such that 

D(C u .., 9 C n )=i E(Cj) (2) 
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is minimized, where / i \ 2 

Eify-l, U- . (3) 

ieCj \ k = l / 

In order to ensure a necessary condition for the partial problems (3) to have a 
solution we have to add the side conditions 

\Cj\Zi a=X,^,n). (4) 

This problem formulation is related to the minimum variance criterion from cluster 
analysis [14] and is more generally given in [16]. A corresponding practical 
problem is given in [6]. 

2. Method 

As the number of possible partitions is too high to enumerate them within reason- 
able computing times for realistic sizes of m and J, one has to choose some 
heuristic solution method that gives a good but not necessary an optimal 
solution. As the so-called exchange method empirically behaves very well for 
similar structured cluster problems [14, 15], it looks attractive to modify it with 
regard to (4) for this problem: 

Step 1: Choose some initial partition with property (4). Set i: = i 0 ,i 0 eM (e.g. 
io=m)j 

Step 2: Set i:=i + i and reset i; = 1 if i>m. Let be ieC } . Then for | C } \>l '\t is 
examined if there are clusters C p with p f j such that 

E (C p u {i})+E (C j -{i})<£ (C p )+E (Cj). (5) 

If this is true, then let r be an index such that the reduction of (3) corresponding to 
(5) becomes a maximum. In this case redefine 

Cf^Cj-ii}, C r : = C r u{i}. 
In all other cases return to step 2. 

Step 3: Repeat step 2 as long as the objective function (3) can be reduced, e.g. 
as long as i has been increased m times without (5) being true. 

This method is stepwise optimal and works sequentially on the observations. Its 
result depends on the initial partition. As with cluster problems [14] it happens 
empirically that after 6 to 10 passes through the observations one obtains a 
useful solution. It is recommended to start with several initial partitions and to 
select the best solution. 

3. Algorithm 

The above method is implemented in the following subroutine CWDLR whose 
formal parameters are described precisely in the comment cards of the list. An 
initial partition C l9 „.., C n has to be given via an integer vector (p u ...,pj with 
p.=j if observation i shall belong to cluster C y For solving the problems of type (3) 
the very reliable subroutine HFTI from [8] is used. If the rank of the matrix 
(a^) (i e C jy fc= 1, 0 should be less than I, then HFTI gives a minimal length 
solution. 



Clusterwise Linear Regression 



List 

SUBROUTINE CWOLR (A,M0A#M, L/Y/P/X/NDX,N,D,B, 

♦ Q/ IP> AA> YY/ YA/ YB, P# 0/ 1 A ) 

THE SUBROUTINE PARAMETERS ARE DEFINED AS FOLLOWS. 

ACMDA/L) THE ARRAY A HAS TO CONTAIN THE GIVEN 

<M#L)~MATRIX (M<*MDA) 
Y(M J THE ARRAY Y HAS TO CONTAIN THE GIVEN 

M-VECTOR 

P(M) THIS M-VECTOR (OP TYPE INTEGER ) INIATIALLY 

HAS TO CONTAIN A FEASIBLE PARTITION OF 
LENGTH N VIA PU)*J CI«X#.t,jM>. 
ON OUTPUT P CONTAINS THE FINAL PARTITION 
OBTAINED BY THE EXCHANGE METHOD 

XINDX/L) HILL CONTAIN THE tN,L>-HATRIX <N<*NDX) OP 
SOLUTION PARAMETERS/ !.6.'XCJ>{ IS THE 
POUND PARAMETER SET FOR THE J-TH CLUSTER 
<J»X>»«*jN> 

D THE OBTAINED VALUE OF THE OBJECTIVE FUNCTION 

E(N) ECU) WILL CONTAIN THE ERROR SUM OF SQUARES 

FOR THE J-TH CLUSTER 

THE OTHER ARRAYS ARE FOR WORKING SPACE. THEIR 
LENGTHS CAN BE SEEN PROM THE DIMENSION STATEMENTS r 

IA THE EXCHANGE STEPS ARE STARTED AT THE ROW 

NUMBER IA * X < MOD M). NORMALLY ONE SETS IA-O, 

THE VERY RELIABLE SUBROUTINE 'HFTI » IS USED TO SOLVE ALL 
LEAST SQUARES PROBLEMS* A LIST CAN BE FOUND INl 
LAMSON/HANSON/ SOLVING LEAST SQUARES PROBLEMS/ 
PRENTICE HALL 1974. 

DIMENSION A(MDAsL)' Y<M)/ X<NDX/L)#ECN>/ 

* AA(MDA/L)/YY(M)jYA(L)/YB(L>/P{M)#G<M)^RNDRM<1) 
INTEGER P<M)/Q<NMP<L)>H,U/V/R 

LOGICAL I TRUE 
TAU-l.E-10 
ITRUEo.TRUE ■ 

IT«0 
0*0. 
X DO 10 J«I/N 

IFUTRUE) GOTO 2 

IFU.NE.R) PUJ«J 

IPU.EQ.R) P(X)«0 

z v«b 

DQ 4 H*l# M 

IP(P(H).NB.J) GOTO 4 
V«V+I 

YY(V>*Y<H) 
DO 3 K°l/L 

AA(V/K>pA<H,IO 
3 CONTINUE 
A CONTINUE 

rF(V.LT.L) RETURN 



CALL HFTI (AA*MDA/VA/YY/l/X/TAU/KRANK#RNaRM/P#fi#IP> 

FF*RNORM( I )*RNORM( 1 ) 

IF(.NOT.ITRUE) GOTO 6 

QU)»V 

EU)«FF 

0*D*FF 

DO 3 K«l/L 

X(J/K)«YY<K) 
3 CONTINUE 

GOTO XO 
6 IFU.NE.R) GOTO 8 

AF*FP 

DO 7 KbI/L 

YA(K)»YY<K) 
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List (Continuation) 

7 CONTINUE 
GOTO 10 

8 IF(FF.GT*BP) GOTO 10 
BF-FF 

U«J 

00 9 K*UL 

YB(K)*YY<K) 

9 CONTINUE 
10 CONTINUE 

IFt.NOT.ITRUE) GOTO U 
IF(N.lE.i) RETURN 
ITRUE». FALSE* 
GOTO 14 
IX BU»BF-E(U) 
RA«E<R>-AP 
IFCBU.LT.RA) GOTO, 12 
IT«IT*1 
P(I>«R 
GOTO 14 

12 JT«0 
0(R)«AF 
E(U>*BP 
O-D-RA+BU 

pm*u 

Q<UV*Q(U)U 
00 13 K*1>L 

X(R,K)«YA<K> 

X<U,K)*YB<K> 

13 CONTINUE 

14 I«I + i 
IF(X.OT,M) IpI-H 
IFHT.EQ.H) RETURN 
R»P<1> 

IP{QCR).LE.U GOTO 14 

BF*1*E50 

GOTO 1 

END 



4, Numerical Example 

For the data from Table 1 it has been started with the standard initial partition 
p, = l+mod(i-l,w) and i 0 =m = 20. 







Table 1 






i 


y i 


a il 


9 )2 


a l3 


1 


960,0 


60,6 


18,0 


1.0 


2 


830,0 


220.0 


0,0 


1.0 


3 


1260,0 


180,0 


14.0 


1.0 


4 


610,0 


80,0 


6.0 


1.0 


5 


590,0 


120.0 


1.0 


1.0 


6 


900.0 


100.0 


9.0 


1.0 


7 


820.0 


170.0 


6.0 


l.o 


8 


880. 0 


110.0 


12.0 


1.0 


9 


860,0 


160.0 


7.0 


1.0 


10 


760,0 


230.0 


2.0 


1.0 


11 


1020,0 


70*0 


17.0 


1.0 


12 


1080.0 


120. 0 


19.0 


1.0 


13 


960,0 


240.0 


7.0 


1.0 


14 


700,0 
800. 0 


160.0 


0.0 


1.0 


13 


90.0 


12.0 


1.0 


16 


1130,0 


110,0 


16.0 


1.0 


17 


760,0 


220.0 


2.0 


1.0 


18 


740,0 


110.0 


6.0 


1.0 


19 


980*0 


160.0 


12.0 


1.0 


20 


B00.O 


80.0 


13.0 


1.0 
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For n—2 the corresponding final partition is 

Ci = {5, 6, 7, 9, 10, 11, 12, 13, 16, 17, 18, 19} 
C 2 = {1,2,3,4,8, 14,15,20} 

and for n =3 it is 

^ = {5,6,7,9,13, 16,19} 
C 2 = {10, 14,15, 17, 18,20} 
C 3 = {1,2,3,4,8,11, 12}. 

The values for E (Cj) and x Jk (k = 1, 2, 3) are given for n = 1, 2, 3 in Table 2. 



Table 2 


n 






x n 






1 


1 


993503 


2.14 


32.57 


285.46 


2 


1 


100133 


1.24 


33.40 


417.58 




2 


16850.1 


3.60 


34.21 


84.56 


3 


1 


4798.8 


0.99 


35.06 


450.41 




2 


3443 


0.52 


9.88 


621.23 




3 


4959.4 


3.42 


38.89 


82.53 



We have had a reduction for the objective function for n=2 from D= 93225.1 to 
D= 26863.4 and for n= 3 from D = 58752.0 to D = 10102.5. The number of passes 
through the observations was three for n=2 and for n = 3. For n=4 we have got 
D = 1810.0, but also one cluster C 4 with | C 4 |=/ = 3, that is E (C 4 )=0. 

The computing time on a TR 440 computer for this example together with those 
for two other cases can be found in Table 3. 

Table 3 

m I n time in seconds 



20 3 1,2,3 4.8 
20 4 1,2,3 12.6 
64 4 1,2,-. .,7 4113 

The partitions given above are not optimal ones. Starting in Step 1 with i 0 = 3 k 
(fc = l,...,6) better solutions are found. In the case of n=2 the best one is 
obtained for fc=6 (D = 21404.6) and in the case of n = 3 for fc = 5 (D = 4930.5). 

5. Remarks 

As indicated in [16] it is possible to reduce computing times (perhaps drastically) 
by calculating the solution vectors for C ; u {i} and C, - {i} from that one for C. 
by updating processes. More stable but even more cumbersome updating methods 
are discussed in [3, 7, 8, 11, 12, 17], 
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If one would choose in (3) the L x norm (sum of absolute deviations) or the norm 
(maximum absolute deviation) instead of the squared L 2 norm (sum of squared 
deviations) then solutions are normally obtained through linear programming 
techniques [1,2,4, 13]. In this case, too, updating techniques are available [5] 
but would have to be worked out for this special purpose. 

Now in the author's opinion and corresponding to his experience with real life 
problems from economics values of p with pe(l,2) — say pwl.3 — are more 
adequate than p — l or p = 2. Choosing the L p norm (1 <p< oo, p=£2) and cor- 
responding numerical methods [9, 10, 13, 18], similar updatings seem not to be 
possible. Thus in this case the structure of CWDLR can be taken over and HFTI 
has to be replaced by a subroutine like LP from [13]. 
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